Abstract

A  concurrency  control  mechanism  (or  a  scheduler)  is  the  component  of  a  database  system  that  safeguards 
the  consistency  of  the  database  in  the  presence  of  interleaved  accesses  and  update  requests.  We  formally 
show  that  the  performance  of  a  scheduler,  i.e.,  the  amount  of  parallelism  that  it  supports,  depends  explicitly 
upon  the  amount  of  information  that  is  available  to  the  scheduler.  We  point  out  that  most  previous  work  on 
concurrency control is simply concerned with specific points of this basic trade-off between performance and
information. In fact, several of these approaches are shown to be optimal for the amount of information that
they use.

1. Introduction

A  database  system  may  interact  with  many  transactions  in  an  interleaved  manner.  Even  if  we  assume  that 
each such individual transaction is correct (that is, it preserves the consistency of the database when run by
itself),  the  interleaved  mode  of  operation  may  result  in  inconsistencies  (see,  for  example,  [2]).  It  is  the  task  of 
the  concurrency  control  mechanism  of  the  database  system,  called  scheduler  in  this  paper,  to  safeguard  the 
consistency  of  the  database  by  granting  or  rejecting  the  execution  of  atomic  steps  of  transactions,  when 
requests  for  such  executions  are  made. 

The  design  of  schedulers  for  databases  has  proved  to  be  a  non-trivial  problem,  and  some  theoretical  work 
on the subject has appeared (see, for example, [2, 6, 7, 9]). Several solutions to this problem have been
proposed  under  a  variety  of  assumptions.  In  this  paper,  we  give  a  uniform  framework  for  evaluating  these 
solutions,  and,  in  some  cases,  for  establishing  their  optimality.  A  scheduler  is  evaluated  in  terms  of  its 
performance,  which  is  measured  by  the  set  of  request  sequences  that  the  scheduler  can  authorize  for  execution 
without  any  delay.  This  set  of  request  sequences  is  called  the  fixpoint  set  of  the  scheduler.  The  idea  is  that  the 
richer  this  set  is,  the  more  likely  that  no  delays  will  be  imposed  by  the  scheduler.  In  this  sense  the  fixpoint  set 
is  a  fair  measure  of  the  parallelism  supported  by  the  scheduler,  and  therefore  of  its  performance. 

We  observe  that  there  is  a  trade-off  between  scheduler  performance  and  the  information  used  by  the 
scheduler.  The  latter  is  the  minimum  knowledge  about  the  database  and  the  transactions  that  the  scheduler 
requires  in  order  to  function  correctly.  Typical  information  that  could  be  useful  to  the  scheduler  is  syntactic 
information  about  the  transactions  (that  is,  a  flowchart  with  the  names  of  the  database  entities  accessed  and 
updated  at  each  step);  or  semantic  information  about  the  meaning  of  the  data  and  the  operations  performed; 
or the integrity constraints, that is, the consistency requirements that the database must satisfy. Ideally, a scheduler
would  like  to  have  a  perfect  knowledge  of  all  these  three  components  of  information.  It  is  usually  necessary, 
however,  to  have  the  scheduler  operate  at  some  imperfect  level  of  information.  There  are  many  reasons  for 
this.  Some  information  (e.g.,  integrity  constraints)  may  not  be  known  explicitly  even  to  the  designer  of  the 
database.  If  semantic  information  is  given  in  some  powerful  enough  language  (e.g.,  arithmetic)  then  it  may 
not be possible to reason about it effectively. Finally, to utilize sophisticated information may render the
scheduling problem combinatorially intractable; see [6] for a case in which the mere ability to
distinguish between read and write operations makes the problem NP-complete. It should be intuitively
clear that the more information the scheduler has, the better job it can do in enriching its fixpoint set, and
therefore increasing its performance. We capture this intuitive trade-off in an equation (Theorem 3.1) and
exhibit  several  specific  instances  for  which  well  known  concurrency  principles  correspond  to  optimal 
schedulers  (optimal  with  respect  to  the  information  that  they  use).  For  example,  in  our  framework  we  can 
formally show that serializability (which has been adopted in an ad hoc manner in virtually all the
concurrency  control  literature)  is  indeed  the  right  notion  of  correctness  when  only  syntactic  information  is 
available  (as  is  usually  the  case).  If  semantic  or  integrity  information  is  available,  then  more  liberal 
correctness  criteria  may  be  used  (see,  for  example,  [3, 4]).  We  also  prove  that  some  strict  version  of  the  two- 
phase  locking  technique  of  [2]  is  the  best  possible  principle  when  syntactic  information  is  acquired  in  an 
incremental  manner. 

The  paper  is  organized  as  follows.  In  Section  2  we  introduce  our  model  for  transaction  systems,  carefully 
distinguishing  among  the  syntactic,  semantic,  and  integrity  constraint  components.  In  Section  3  we  formally 
introduce  the  notion  of  schedulers,  and  develop  the  basic  tools  for  studying  the  information  vs.  performance 
trade-off.  Specific  examples  of  optimal  schedulers  are  presented  in  Section  4. 

2. Transaction Systems

2.1 Definition of a Transaction System

By  a  transaction  system  we  mean  a  database  (that  is,  data  and  integrity  constraints)  together  with  a  set  of 
statically prespecified transaction programs. A transaction system can be formally defined in terms of three
components:  syntax,  semantics,  and  integrity  constraints. 


2.1.1.  Syntax 

A transaction system T is a finite set of transactions, $\{T_1, \ldots, T_n\}$, $n \ge 1$, where each transaction $T_i$ is a finite
sequence of transaction steps, $T_{i1}, \ldots, T_{im_i}$. The n-tuple of integers $(m_1, \ldots, m_n)$ is called the format of the
transaction system. For simplicity, we assume that all transaction systems under consideration have the same,
fixed format.


The  transactions  in  a  transaction  system  operate  on  a  set  of  variable  names.  The  variables  are  abstractions  of 
data  entities,  whose  granularity  is  not  important  for  our  development.  The  variables  can  represent  bits,  files 
or records, as long as they are individually accessible. The set of variable names is denoted by V. Besides the
(global) variables in V, each transaction $T_i$ is associated with local variables $t_{i1}, \ldots, t_{im_i}$. A transaction step $T_{ij}$
in $T_i$ can be thought of as the indivisible execution of the following two instructions:

$$t_{ij} \leftarrow x_{ij}, \qquad x_{ij} \leftarrow f_{ij}(t_{i1}, \ldots, t_{ij}),$$

where $f_{ij}$ is a j-place function symbol. That is to say, at step $T_{ij}$ the current value of some global variable $x_{ij} \in V$
is stored at a local place $t_{ij}$, and then $x_{ij}$ is assigned a new value, based on function $f_{ij}$ and knowledge available
to the transaction $T_i$ at this time, namely, the values of all "declared" local variables $t_{i1}, \ldots, t_{ij}$. The meaning of
function $f_{ij}$ is open to arbitrary interpretations at this point. For example, it could be the identity function on
$t_{ij}$, in which case $T_{ij}$ is simply a read step. Similarly, if all $f_{ik}$ with $k \ge j$ are independent of $t_{ij}$, then $T_{ij}$
is a write step.

2.1.2. Semantics

Associated with each variable name $v \in V$ we have an enumerable set D(v), the domain of v, consisting of
all possible values that the variable v can assume, typically the integers, the Boolean values, or finite strings.
A local variable $t_{ij}$ always has the same domain as $x_{ij}$.

A  state  of  a  transaction  system  T  is  a  triple  (J,  L,  G),  where 

• J is an n-tuple of integers $(j_1, \ldots, j_n)$ with $1 \le j_i \le m_i + 1$, specifying the next step of each transaction
$T_i$. The $j_i$'s are thus program counters. If $j_i = m_i + 1$, then transaction $T_i$ has terminated.

• L is an element in $\prod_{i \le n} \bigl( \prod_{j < j_i} D(x_{ij}) \bigr)$ representing the values of all declared local variables.

• G is an element in $\prod_{v \in V} D(v)$ representing the current values of all global variables $v \in V$.

The semantics of T associate with the function symbol $f_{ij}$ at each step $T_{ij}$ a function
$\varphi_{ij}: \prod_{k=1}^{j} D(x_{ik}) \to D(x_{ij})$, which is the interpretation of $f_{ij}$. Thus the execution of a transaction step maps
one state of the transaction system into another one. More precisely, if transaction step $T_{ij}$ is eligible for
execution at state (J, L, G), that is, if $j_i \le m_i$, then its execution modifies the three components of the state as
follows:

$$j_i \leftarrow j_i + 1, \qquad t_{ij} \leftarrow x_{ij}, \qquad x_{ij} \leftarrow \varphi_{ij}(t_{i1}, \ldots, t_{ij}).$$

This  view  of  single  transaction  steps  can  be  extended  to  sequences  of  transaction  steps  in  the  obvious  way. 
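
The state transition defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Step`, `State`, and `execute_step` are hypothetical, indices are 0-based (so a program counter equal to the number of steps means "terminated"), and the one-transaction system at the bottom is invented for the demonstration.

```python
from typing import Callable, Dict, List, Tuple

# A step T_ij is modelled as (name of the global variable x_ij, interpretation phi_ij).
Step = Tuple[str, Callable[..., object]]
# A state is (J, L, G): program counters, declared local values, global values.
State = Tuple[Tuple[int, ...], Tuple[Tuple[object, ...], ...], Dict[str, object]]


def execute_step(system: List[List[Step]], state: State, i: int) -> State:
    """Execute the next step of transaction i (0-based) and return the new state."""
    J, L, G = state
    j = J[i]
    if j >= len(system[i]):
        raise ValueError(f"transaction {i} has terminated")
    var, phi = system[i][j]
    t = L[i] + (G[var],)            # t_ij <- x_ij
    new_G = dict(G)
    new_G[var] = phi(*t)            # x_ij <- phi_ij(t_i1, ..., t_ij)
    new_J = J[:i] + (j + 1,) + J[i + 1:]   # j_i <- j_i + 1
    new_L = L[:i] + (t,) + L[i + 1:]
    return new_J, new_L, new_G


# Hypothetical one-transaction system: first x <- x + 1, then x <- 2 * x.
system = [[("x", lambda t1: t1 + 1), ("x", lambda t1, t2: 2 * t2)]]
s0: State = ((0,), ((),), {"x": 3})
s1 = execute_step(system, s0, 0)
s2 = execute_step(system, s1, 0)
```

Each call stores the value read in a fresh local slot before applying the interpretation, which mirrors the two-instruction form of a transaction step.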

2.1.3. Integrity Constraints

The integrity constraints of a transaction system T correspond to a subset IC of the product $\prod_{v \in V} D(v)$. A
state (J, L, G) of T is said to be consistent if G belongs to IC. A sequence of transaction steps is said to be
correct  if  a  serial  execution  of  the  steps  in  the  sequence  will  map  any  consistent  state  of  the  transaction  system 
into  a  consistent  state. 

The  basic  assumption  throughout  the  paper  is  that  all  transactions  in  a  transaction  system  are  correct. 

2.2 Example

Consider a transaction system consisting of three transactions $T_1$, $T_2$, and $T_3$ that access two banking
accounts A and B in the following way:

• $T_1$ transfers \$100 from A to B if A has enough funds and the balance of B is below \$100.

• $T_2$ withdraws \$50 from B and increments a counter C, if B has enough funds.

• $T_3$ is an auditing transaction that computes the sum S of A and B, and sets the counter C back to 0.


Syntax. The set of global variable names is V = {A, B, S, C}. The $x_{ij}$'s are as follows:

$x_{11} = A$, $x_{12} = B$, $x_{13} = A$,
$x_{21} = B$, $x_{22} = C$,
$x_{31} = A$, $x_{32} = B$, $x_{33} = S$, $x_{34} = C$.

Thus the format of the transaction system is (3, 2, 4).


Semantics. For all $v \in V$, D(v) is the set of natural numbers. Typical states would be as follows:

• (J, L, G) = ((1, 1, 1), *, (150, 50, 200, 0)). This is a possible state before any of the transactions has
started execution. We have A = \$150, B = \$50, S = \$200, C = 0, and don't care about the values of
local variables.

• (J, L, G) = ((2, 2, 4), (150; 50; 150, 0, 200), (150, 0, 150, 0)). In this state, A has not been decreased
but B has. The new S has been computed but C has not.

As for the operations performed by each step:

$\varphi_{11} = t_{11}$
$\varphi_{12}$ = if $t_{11} \ge 100$ and $t_{12} < 100$ then $t_{12} + 100$ else $t_{12}$
$\varphi_{13}$ = if $t_{11} \ge 100$ and $t_{12} < 100$ then $t_{11} - 100$ else $t_{11}$

$\varphi_{21}$ = if $t_{21} \ge 50$ then $t_{21} - 50$ else $t_{21}$
$\varphi_{22}$ = if $t_{21} \ge 50$ then $t_{22} + 1$ else $t_{22}$

$\varphi_{31} = t_{31}$
$\varphi_{32} = t_{32}$
$\varphi_{33} = t_{31} + t_{32}$
$\varphi_{34} = 0$


The integrity constraints may very well be the set of states for which $A \ge 0$, $B \ge 0$, and $A + B = S - 50C$.
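
The banking example can be exercised with a small Python sketch. The encoding below is an assumption made for illustration: schedules are lists of 0-based transaction indices, `run` and `consistent` are hypothetical helper names, and the operand of $\varphi_{13}$ is taken to be $t_{11}$. The sketch runs a serial schedule and one interleaving in which the audit reads A before the transfer and B after it.

```python
# Each step is (global variable x_ij, interpretation phi_ij over t_i1..t_ij).
T1 = [("A", lambda t11: t11),
      ("B", lambda t11, t12: t12 + 100 if t11 >= 100 and t12 < 100 else t12),
      # Assumption: phi_13 uses t11 as its operand.
      ("A", lambda t11, t12, t13: t11 - 100 if t11 >= 100 and t12 < 100 else t11)]
T2 = [("B", lambda t21: t21 - 50 if t21 >= 50 else t21),
      ("C", lambda t21, t22: t22 + 1 if t21 >= 50 else t22)]
T3 = [("A", lambda t31: t31),
      ("B", lambda t31, t32: t32),
      ("S", lambda t31, t32, t33: t31 + t32),
      ("C", lambda *t: 0)]
SYSTEM = [T1, T2, T3]


def run(schedule, G):
    """Run a schedule (list of 0-based transaction indices) from global state G."""
    G = dict(G)
    local_vals = [[] for _ in SYSTEM]
    for i in schedule:
        var, phi = SYSTEM[i][len(local_vals[i])]
        local_vals[i].append(G[var])        # t_ij <- x_ij
        G[var] = phi(*local_vals[i])        # x_ij <- phi_ij(...)
    return G


def consistent(G):
    """Integrity constraints: A >= 0, B >= 0, A + B = S - 50C."""
    return G["A"] >= 0 and G["B"] >= 0 and G["A"] + G["B"] == G["S"] - 50 * G["C"]


G0 = {"A": 150, "B": 50, "S": 200, "C": 0}
serial = [0, 0, 0, 1, 1, 2, 2, 2, 2]         # T1; T2; T3
interleaved = [2, 0, 0, 0, 2, 2, 2, 1, 1]    # T3 reads A, T1 runs, T3 finishes, T2 runs
serial_result = run(serial, G0)
interleaved_result = run(interleaved, G0)
```

Under these assumptions the serial run ends in a consistent state, while the interleaving records S = 300 for accounts that hold 150 in total, so the constraint fails.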

3. Schedules and Schedulers

3.1 Schedules

A schedule of a transaction system T is a permutation of the set $\{T_{ij}: 1 \le i \le n,\ 1 \le j \le m_i\}$ of steps in T
such that $T_{ij}$ comes before $T_{ik}$ in the permutation for j < k. We may think of a schedule as a possible stream of
arriving execution requests, or, in a different context, as a sequence of transaction steps that defines the order
in which these execution requests are granted execution. The set of all schedules of T is denoted by H(T).
Since this set depends only on the format of T and the format is assumed fixed, we shall write H for H(T). A
schedule is said to be correct if its execution preserves the consistency of the database. The set of all correct
schedules of T is denoted by C(T). The set C(T) is always nonempty, since it at least contains (by our basic
assumption that all transactions are correct) all serial schedules, that is, all permutations $\pi$ such that
$\pi(T_{i,j+1}) = \pi(T_{ij}) + 1$ for $1 \le i \le n$ and $1 \le j \le m_i - 1$.
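
As a concrete illustration, the set H and its serial members can be enumerated for a small format. The sketch below is an assumption-laden toy: a schedule is written as a tuple of 0-based transaction indices (the k-th occurrence of i stands for step $T_{i,k}$, which enforces the within-transaction order automatically), and the helper names `schedules` and `is_serial` are hypothetical. The brute-force enumeration is only practical for very small formats.

```python
from itertools import permutations


def schedules(fmt):
    """All schedules for format (m_1, ..., m_n) as tuples of transaction indices."""
    pool = [i for i, m in enumerate(fmt) for _ in range(m)]
    return sorted(set(permutations(pool)))


def is_serial(h):
    """A schedule is serial if the steps of every transaction are contiguous."""
    seen, prev = set(), None
    for i in h:
        if i != prev:
            if i in seen:       # transaction i resumes after being interrupted
                return False
            seen.add(i)
        prev = i
    return True


H = schedules((2, 2))
serial_schedules = [h for h in H if is_serial(h)]
```

For the format (2, 2) this yields six schedules, two of which are serial.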

3.2  Schedulers 

A  scheduler  (or  concurrency  control  mechanism)  transforms  a  stream  of  execution  requests  into  a  correct 
schedule.  This  is  achieved  by  properly  granting  or  rejecting  the  execution  of  arriving  requests.  (A  rejected 
request  is  rescheduled  for  execution  at  some  later  time.)  Thus,  a  scheduler  for  a  transaction  system  T  can  be 
viewed  as  a  mapping  S  from  H  to  C(T). 

We measure the performance of a scheduler S by its fixpoint set $P_S$, defined as
$P_S = \{h \in H: S(h) = h\}$.

Clearly $P_S$ must be a subset of C(T). The larger $P_S$ is, the more improbable it is that S will have to delay (or
reject) the execution of a transaction step, after such an execution is requested. We therefore consider the
inclusion-induced partial order on the sets $P_S$ as a "qualitative" measure of scheduler performance.
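
To make the fixpoint set tangible, here is a minimal sketch of one particular scheduler viewed as a mapping on schedules. The scheduler chosen (run transactions one at a time, in order of first request) and the names `serial_scheduler` and `fixpoint_set` are illustrative assumptions; schedules are tuples of 0-based transaction indices.

```python
from itertools import permutations


def serial_scheduler(h):
    """Delay requests so that transactions run one at a time, in order of first request."""
    order = list(dict.fromkeys(h))          # transactions by first appearance
    return tuple(i for t in order for i in h if i == t)


def fixpoint_set(scheduler, H):
    """P_S = {h in H : S(h) = h}, the request streams passed through without delay."""
    return {h for h in H if scheduler(h) == h}


# All schedules of format (2, 1): transaction 0 has two steps, transaction 1 has one.
H = set(permutations([0, 0, 1]))
P = fixpoint_set(serial_scheduler, H)
```

Here the interleaved request stream (0, 1, 0) is rearranged into (0, 0, 1), so it lies outside the fixpoint set, while the two serial streams are left unchanged.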

3.3  Information 

A level of information available to a scheduler S about a transaction system T is defined to be a set I of
transaction systems {T, T', T'', ...} that contains T. Intuitively, if S is kept at this level of information, it knows
that the transaction system in question is among the transaction systems in I, but does not know exactly which.
Thus, S has to be a scheduler for all transaction systems $T' \in I$. For example, the set I could be the set of all
transaction  systems  that  have  the  same  syntax.  This  level  of  information  corresponds  to  the  case  that  a 
scheduler  has  complete  syntactic  information,  but  no  other  information. 


Alternatively, we could view I as a function that maps any transaction system T to an object I(T) ($\in \{0, 1\}^*$).
Intuitively, I(T) is the information extracted from T by the operator I; for example, I(T) could be an encoding
of the syntax of T. The effect would be that T cannot be distinguished from the transaction systems T' that
have the same image I(T') = I(T); in the notation of the previous paragraph, which we are going to follow henceforth,
I = {T': I(T') = I(T)}.

The  maximum  possible  information  that  a  scheduler  can  have  is,  of  course,  the  complete  syntactic,  semantic 
and integrity information about the transaction system in question; this corresponds to I = {T}. The
minimum information is the format $(m_1, \ldots, m_n)$; this corresponds to I being the set of all transaction systems
of the given format, with the single restriction that the transactions be correct, by our basic assumption. The
more information available to the scheduler, the "better" scheduling results may be expected. We formally
capture  this  idea  in  the  following  theorem: 

Theorem 3.1: For any scheduler S using information I, the fixpoint set $P_S$ must satisfy:

$$P_S \subseteq \bigcap_{T' \in I} C(T').$$

The proof of this theorem uses a general adversary argument, instances of which we shall see many times in
the rest of the paper. The proof goes as follows. If there is a schedule $h \in P_S$ and a transaction system $T' \in I$
such that h is not correct for T', that is, $h\ (= S(h)) \notin C(T')$, then an adversary could "fool" the scheduler S by
choosing T' for S to handle, and giving h as the stream of execution requests. The resulting state after the
execution can be inconsistent, since $S(h) \notin C(T')$. Thus, the scheduler is incorrect.

As a corollary of Theorem 3.1, the maximum-performance scheduler using information I is the one that has
its fixpoint set $P = \bigcap_{T' \in I} C(T')$. We call this scheduler the optimal scheduler for the level of information
I. (Notice that in practice there may be insurmountable difficulties, such as the negative complexity results
in [6], in realizing the optimal scheduler for a given level of information.) The concept of information
introduced here partially orders schedulers with respect to their sophistication: we say that S is more
sophisticated than S' if S operates at a level of information I that is properly included in the level of
information I' of S', that is, $I \subset I'$. On the other hand, schedulers are also partially ordered with respect to
their performance: we say that S performs better than S' if $P_S \supseteq P_{S'}$. Then the mapping from any level of
information I to the fixpoint set of the optimal scheduler for I, $I \mapsto \bigcap_{T' \in I} C(T')$, is a natural isomorphism between
these two partially ordered sets. This captures the fundamental trade-off between scheduler information and
performance, that is, if $I \subset I'$ then $P_S \supseteq P_{S'}$ for the optimal schedulers S and S' for I and I', respectively.
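
The bound of Theorem 3.1 and the resulting trade-off can be illustrated with plain set operations. In the sketch below the two transaction systems and their correct-schedule sets are invented purely for illustration (they are not taken from the paper), and `optimal_fixpoint` is a hypothetical helper name.

```python
def optimal_fixpoint(info_level, correct):
    """Intersection of C(T') over all systems T' in the information level I."""
    return set.intersection(*(correct[name] for name in info_level))


# Hypothetical correct-schedule sets for two systems of format (2, 1).
correct = {
    "T":  {(0, 0, 1), (0, 1, 0), (1, 0, 0)},   # every schedule happens to be correct
    "T'": {(0, 0, 1), (1, 0, 0)},              # only the serial schedules are correct
}

small_I = ["T"]            # more information: the system is known exactly
large_I = ["T", "T'"]      # less information: the system could be either one

P_small = optimal_fixpoint(small_I, correct)
P_large = optimal_fixpoint(large_I, correct)
```

Enlarging the information level from `small_I` to `large_I` can only shrink the intersection, which is the monotonicity stated at the end of the paragraph above.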

In the next section, we present several examples of schedulers that are optimal for different levels of
information. 

4. Optimal Schedulers

4.1 Optimal Schedulers for Extremes of Information

Maximum Information

This  is  the  case  when  complete  information  on  the  transaction  system  T  in  question  is  available  to  the 
scheduler.  The  information  level  I  in  this  case  is  a  singleton  set,  i.e.,  I  =  {T}.  We  can  therefore  define  the 
scheduler  S,  in  principle  at  least,  such  that  Ps  =  C(T).  This  is  the  optimal  scheduler  for  the  maximum  level  of 
information. 

Minimum  Information 

If  we  only  know  the  format  of  T,  then  we  have  the  poorest  possible  level  of  information.  What  is  the  best 
possible  scheduler  in  this  case?  Consider  a  serial  scheduler  S  which  is  defined  to  be  a  scheduler  satisfying  the 
following  property  for  any  T: 

S(H) = {all serial schedules of T} and $P_S$ = {all serial schedules of T},
where serial schedules are defined in Section 3.1. By our basic assumption that each transaction is correct, we
see that each schedule in S(H) is correct.

Theorem  4.1 :  The  serial  scheduler  S  is  optimal  among  all  schedulers  using  the  minimum  information. 

Proof. Suppose that S is not optimal. Then there must exist a non-serial schedule in C(T) in which some
steps $T_{ik}$, $T_{jl}$, $T_{i,k+1}$ in T are executed in this order. Note that because of the minimum information
assumption, I may contain transaction systems with any integrity constraints and interpretations for steps. We
assume that the integrity constraints for some transaction system T' in I correspond to "x = 0", and that the
interpretations of function symbols are such that $T_i$ is $\{T_{ik}: x \leftarrow x + 1,\ T_{i,k+1}: x \leftarrow x - 1\}$ and $T_j$ is
$\{T_{jl}: x \leftarrow 2x\}$. We see that $T_i$ and $T_j$ are correct, but the sequence $(T_{ik}, T_{jl}, T_{i,k+1})$ is not correct, for it may
transform a consistent state, x = 0, into an inconsistent state, x = 1. Thus, the schedule is not in C(T'). This
contradiction implies that for the minimum information case, the only correct schedules that a scheduler can
produce are serial schedules. Hence, the serial scheduler defined above is optimal. □
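
The adversary system used in this proof is small enough to check mechanically. The following sketch is an illustration with hypothetical helper names (`run`, `is_correct`); transaction 0 plays the role of $T_i$ (x ← x + 1, then x ← x − 1) and transaction 1 the role of $T_j$ (x ← 2x), with schedules written as tuples of 0-based transaction indices.

```python
from itertools import permutations

# Adversary system T' from the proof: T_i = (x + 1, x - 1), T_j = (2x).
STEPS = [[lambda x: x + 1, lambda x: x - 1],
         [lambda x: 2 * x]]


def run(h, x):
    """Apply the steps of schedule h to the single global variable x."""
    pc = [0] * len(STEPS)
    for i in h:
        x = STEPS[i][pc[i]](x)
        pc[i] += 1
    return x


def is_correct(h):
    # The integrity constraint is "x = 0", so x = 0 is the only consistent state.
    return run(h, 0) == 0


H = sorted(set(permutations([0, 0, 1])))
C = [h for h in H if is_correct(h)]
```

Only the two serial schedules survive in `C`; the interleaving (0, 1, 0) takes x from 0 to 1.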

4.2 Optimal Schedulers for Complete Syntactic Information

Suppose now that all syntactic information is available; that is, the information level has the property that I
is the set of all transaction systems with the same syntax. As in a similar situation in the theory of program
schemata, one can supplement this syntax with canonical semantics called Herbrand semantics (see [5] for a
detailed exposition). For all $v \in V$, the domain D(v) is the set of all strings from the alphabet
$\Sigma = V \cup \{f_{ij}: i = 1, \ldots, n;\ j = 1, \ldots, m_i\}$ plus the symbols "(", ")", and ",". If $a_1, \ldots, a_j$ are elements of D(v),
then $\varphi_{ij}(a_1, \ldots, a_j)$, the interpretation of $f_{ij}$, is the string $f_{ij}(a_1, \ldots, a_j)$. In other words, the Herbrand
interpretation captures all the history of the values of all global variables. We say that a schedule h is
serializable  if  its  execution  results  are  the  same  as  the  execution  results  of  some  serial  schedule  under  the 
Herbrand semantics. Since serial schedules are correct, so are serializable schedules. By SR(T) we denote the
set of all serializable schedules of T. A serialization scheduler is defined to be a scheduler S satisfying the
following property for any T:

S(H) = SR(T) and $P_S$ = SR(T).

Theorem 4.2: A serialization scheduler is optimal among all schedulers using complete syntactic information.

Proof. To prove the optimality, for any schedule $h \notin SR(T)$, we shall define a transaction system $T' \in I$
such that $h \notin C(T')$. The semantics of T' are the Herbrand interpretation. Now, for the integrity constraints,
we define IC as follows. Assume that T' is consistent initially. Let $(v_1, \ldots, v_k)$ be the initial values of global
variables in V, where k = |V|. If $a_1, \ldots, a_k$ are in D(v), we say that $(a_1, \ldots, a_k) \in IC$ iff there exists a serial
concatenation Q (possibly empty) of some transactions in T' such that the initial values $(v_1, \ldots, v_k)$ are
transformed by Q to $(a_1, \ldots, a_k)$. By this definition, all transactions are individually correct, and our basic
assumption holds. Now, it is easy to see that, if h is any schedule not in SR(T), then it transforms the initial
values $(v_1, \ldots, v_k)$ to a set of values not in IC. Hence, $h \notin C(T')$. □

The  theorem  shows  that  even  if  complete  syntactic  information  of  a  transaction  system  T  is  available  to  a 
scheduler, SR(T) is the maximum possible set of correct schedules a scheduler can hope to produce. After all,
syntactic information is the information one can easily extract in a transaction system, by having the users
declare the files that they intend to open, say. It is therefore not at all surprising that most approaches to
concurrency control have serialization as their goal [2, 8, 7, 1, 6].
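
Serializability under the Herbrand semantics can be tested directly by building the symbolic strings. The sketch below is a brute-force illustration, not an efficient algorithm: it assumes a syntax is given as a list of lists of variable names (entry `syntax[i][j]` standing for $x_{ij}$ with 0-based indices), writes function symbols as `f1_2` for $f_{12}$, and compares only final global values. All helper names are hypothetical.

```python
from itertools import permutations


def herbrand_run(syntax, schedule):
    """Final symbolic value of each global variable under the Herbrand interpretation."""
    G = {v: v for tx in syntax for v in tx}      # initial value of v is the symbol v
    local_vals = [[] for _ in syntax]
    for i in schedule:
        j = len(local_vals[i])
        v = syntax[i][j]
        local_vals[i].append(G[v])               # t_ij <- x_ij
        G[v] = f"f{i + 1}_{j + 1}(" + ",".join(local_vals[i]) + ")"
    return G


def serial_schedules(syntax):
    """One serial schedule per ordering of the transactions."""
    for order in permutations(range(len(syntax))):
        yield tuple(i for i in order for _ in syntax[i])


def is_serializable(syntax, h):
    target = herbrand_run(syntax, h)
    return any(herbrand_run(syntax, s) == target for s in serial_schedules(syntax))


h = (0, 1, 0)                                    # T_11, T_21, T_12
disjoint_tail = [["x", "y"], ["x"]]              # T_1 touches x then y; T_2 touches x
same_variable = [["x", "x"], ["x"]]              # every step touches x
```

With `disjoint_tail` the interleaving h leaves the same strings as running $T_1$ and then $T_2$, so it is serializable; with `same_variable` no serial order reproduces the history recorded in x.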

4.3 Optimal Schedulers for Complete Semantic Information but Integrity Constraints

Consider the transaction system of Fig. 4-1.

    $T_1$                                $T_2$
    $T_{11}: x \leftarrow x + 1$         $T_{21}: x \leftarrow x + 1$
    $T_{12}: x \leftarrow 2 * x$

Figure 4-1: A transaction system.

The schedule $h = (T_{11}, T_{21}, T_{12})$ is not serializable since the Herbrand values for x of the two serial histories
are $f_{12}(f_{11}(f_{21}(x)))$ and $f_{21}(f_{12}(f_{11}(x)))$, whereas that of h is $f_{12}(f_{21}(f_{11}(x)))$. But with the given
interpretations of the $T_{ij}$'s, h is seen to produce the same state as the serial history $(T_{21}, T_{11}, T_{12})$. Hence, our
knowledge  of  the  interpretations  allows  us  to  expand  the  set  of  correct  schedules.  It  is  not  hard  to  see, 
however,  that  the  gains  are  delimited  by  a  generalized  notion  of  serialization,  defined  as  follows.  A  schedule  h 
is  said  to  be  weakly  serializable ,  if  starting  from  any  state  E  the  execution  of  the  schedule  will  end  with  a  state 
which  is  achievable  by  the  execution  of  some  concatenation  of  transactions  in  T,  possibly  with  repetitions  and 
omissions  of  transactions,  also  starting  from  state  E.  Since  transactions  are  assumed  to  be  correct,  a  weakly 
serializable  schedule  is  correct.  Denote  by  WSR(T)  the  set  of  all  weakly  serializable  schedules  of  T.  It  is  clear 
that $SR(T) \subseteq WSR(T)$. A weak serialization scheduler is defined to be a scheduler S satisfying the following
property for any T:

S(H) = WSR(T) and $P_S$ = WSR(T).

Theorem  4.3:  A  weak  serialization  scheduler  is  optimal  among  all  schedulers  using  all  information  but  the 
integrity  constraints. 

The  proof  is  quite  similar  to  the  proof  of  Theorem  4.2,  and  is  omitted. 
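
The claim about the schedule h of Fig. 4-1 can be checked numerically. The sketch below is illustrative only: it samples a range of integer starting states rather than proving the property for every state, it compares h against a single serial history rather than all concatenations allowed by the definition of weak serializability, and the names `STEPS` and `run` are hypothetical.

```python
# Interpretations from Fig. 4-1: T_1 = (x + 1, 2 * x), T_2 = (x + 1).
STEPS = {0: [lambda x: x + 1, lambda x: 2 * x],
         1: [lambda x: x + 1]}


def run(schedule, x):
    """Apply the steps of a schedule (tuple of transaction indices) to x."""
    pc = {0: 0, 1: 0}
    for i in schedule:
        x = STEPS[i][pc[i]](x)
        pc[i] += 1
    return x


h = (0, 1, 0)                 # T_11, T_21, T_12
serial_t2_first = (1, 0, 0)   # T_21, T_11, T_12
serial_t1_first = (0, 0, 1)   # T_11, T_12, T_21

same_as_t2_first = all(run(h, x) == run(serial_t2_first, x) for x in range(-50, 51))
```

On every sampled state h ends at 2x + 4, the same as the serial history that runs $T_2$ first, and differs from the other serial history, which ends at 2x + 3.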

4.4  Optimal  Schedulers  for  Dynamic  Syntactic  Information 

So  far  we  have  implicitly  assumed  that  the  information  of  a  scheduler  about  a  transaction  system  is  static  in 
nature,  that  is,  prespecified  and  fixed.  We  now  consider  the  case  that  information  is  dynamic,  that  is,  the 
amount  of  information  available  to  a  scheduler  increases  as  the  scheduler  proceeds.  We  restrict  ourselves 
mainly to the important case of dynamic syntactic information.

At  a  given  state  (J,  L,  G)  of  a  transaction  system  T,  the  dynamic  syntactic  information  available  to  a 
scheduler is the complete syntactic information on all transaction steps $T_{ij}$ with $1 \le i \le n$, $1 \le j < j_i$, and on
those $T_{ij_i}$, $1 \le i \le n$, which are pending for execution. Thus, the set I corresponding to this level of
information consists of all transaction systems of the given format that are syntactically identical to the one at
hand up to the specified points. We can define, by a straightforward generalization of the definition of $P_S$, the
fixpoint  set  PD  of  an  optimal  scheduler  that  uses  dynamic  syntactic  information.  By  Theorem  4.2,  we  know 
that  PD  must  be  contained  in  the  set  SR(T)  of  serializable  schedules  of  T.  Theorem  4.4  below  characterizes  PD 
exactly. 

Optimal  schedulers  for  dynamic  syntactic  information  are  closely  related  to  schedulers  that  are 
implemented by the well-known two-phase locking policy [2], which is defined informally as follows: (a) if a
transaction accesses $x \in V$, then there is a lock x step before the first access of x and an unlock x step after the
last, and (b) no lock step appears in any transaction after the first unlock step. Thus each transaction has two
phases:  the  locking  phase,  during  which  no  locks  can  be  released,  and  the  unlocking  phase,  during  which  no 
locks  may  be  requested.  Notice  that  rules  (a)  and  (b)  do  not  uniquely  specify  the  positions  of  lock-unlock 
steps. 

A  two-phase  locking  scheduler  is  simply  a  scheduler  that  treats  transactions  as  though  they  were  locked 
according  to  some  version  of  the  two-phase  locking  policy.  The  fact  that  schedules  output  by  a  two-phase 
locking scheduler are all correct follows from a proof in [2]. The following version of the two-phase locking
policy  can  be  implemented  by  a  scheduler  using  dynamic  syntactic  information. 

A strong two-phase locking policy. For any $x \in V$, lock x is always inserted immediately before the first
access of x, and unlock x occurs only after the last step of a transaction (or immediately before it, in case the
last step does not update x).

Theorem  4.4:  A  two-phase  locking  scheduler  corresponding  to  the  strong  two-phase  locking  policy  is 
optimal  among  all  schedulers  using  dynamic  syntactic  information. 

Proof: Suppose that h is a schedule not belonging to the fixpoint set of the two-phase locking scheduler
defined in the theorem. Then there must exist transaction steps in h, say $T_{1j_1}$ and $T_{2j_2}$, such that

• $T_{1j_1}$ is not the last step of transaction $T_1$, i.e., $j_1 < m_1$,

• $T_{1j_1}$ and $T_{2j_2}$ access the same variable x, and

• these steps are scheduled in h in the order $T_{1j_1}, \ldots, T_{2j_2}, \ldots, T_{1m_1}$, and either $T_{1m_1}$ updates x, or $T_{1m_1}$
was not pending when $T_{2j_2}$ was scheduled for execution.

We can construct a transaction system $T' = \{T_1, T_2\}$, compatible with the syntactic information that was
available at the moment when $T_{2j_2}$ was scheduled, such that $h \notin C(T')$. Transaction system T' is defined as
follows:

$$T_{1j_1}: x \leftarrow x + 1, \qquad T_{2j_2}: x \leftarrow 2 \cdot x, \qquad T_{1m_1}: x \leftarrow x - 1,$$

all other steps are read steps, i.e., $T_{ij}: x_{ij} \leftarrow x_{ij}$, and the integrity constraint is "x = 0." It is readily seen that
schedule h is not a correct one for T', and thus the theorem follows. □

Note that the scheduler in Theorem 4.4 need not really insert locks and unlocks into transactions, as it can
just keep track of the first occurrence of each variable in each transaction.
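
A scheduler of this kind can be sketched without explicit lock steps by recording which transaction first touched each variable. The code below is a simplified illustration under stated assumptions: it holds every lock until the owning transaction has executed its last step, so it ignores the refinement that permits release just before a last step that does not update the variable; a syntax is a list of lists of variable names; schedules are tuples of 0-based transaction indices; and `passes_strong_2pl` is a hypothetical name.

```python
def passes_strong_2pl(syntax, h):
    """True if every request in h can be granted immediately (h is a fixpoint)."""
    holder = {}                       # variable -> transaction holding its lock
    pc = [0] * len(syntax)
    for i in h:
        v = syntax[i][pc[i]]          # syntax of the pending step is known
        if holder.get(v, i) != i:     # another transaction still holds v
            return False              # the request would be delayed
        holder[v] = i                 # implicit lock at first access
        pc[i] += 1
        if pc[i] == len(syntax[i]):   # transaction finished: release its locks
            holder = {u: t for u, t in holder.items() if t != i}
    return True


syntax = [["x", "y"], ["x"]]          # T_1 accesses x then y; T_2 accesses x
interleaved = (0, 1, 0)
serial = (0, 0, 1)
```

For this syntax the interleaved stream is delayed because $T_1$ still holds x when $T_2$ asks for it, whereas the serial stream passes unchanged. The function only consults the steps already executed or pending, which is the dynamic syntactic information described in this section.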

If a scheduler is given additional dynamic information, namely (a) the read-completion information, indicating
the earliest point at which a transaction has read all the global variables that it ever wants to access, and (b) the
last-use information, indicating for each global variable in V the point in a transaction at which the variable is
used (read or written) for the last time, then the scheduler may enjoy higher performance. Using the
read-completion and the last-use information, the following version of the two-phase locking policy can be
implemented  by  a  scheduler. 

A weak two-phase locking policy. For any $x \in V$, lock x is always inserted immediately before the first access
of x, and unlock x occurs as early as possible, as long as the two-phase locking requirement is still maintained.

Theorem  4.5:  A  two-phase  locking  scheduler  corresponding  to  the  weak  two-phase  locking  policy  is 
optimal  among  schedulers  using  dynamic  syntactic  information  plus  the  read-completion  and  the  last-use 
information. 

Proof: Suppose that h is a schedule not belonging to the fixpoint set of the two-phase locking scheduler
defined in the theorem. Then there must exist transaction steps, say $T_{1j_1}$ and $T_{2j_2}$ in h, such that (a) these steps
are scheduled for execution in the above order, (b) $T_{1j_1}$ and $T_{2j_2}$ both access the same variable x, and (c) $T_{1j_1}$ is
not the last step in transaction $T_1$ that uses x, or (c') $T_{2j_2}$ is not after the read-completion point for $T_1$. For the
case of (c) we define the transaction system T' to be the same as the one used in the proof of Theorem 4.4. For
the case of (c'), we define the transaction system $T' = \{T_1, T_2\}$ to be such that its updating steps are

x ← x + 1,
y ← 2 * y,
y ← y + 1,
x ← 2 * x,

all other steps are read steps, i.e., $T_{ij}: x_{ij} \leftarrow x_{ij}$, and the integrity constraint is "x = y." We see that the
transaction system T' defined in either case is compatible with the syntactic information available at the moment
when $T_{2j_2}$ is scheduled for execution while $T_{1m_1}$ is not yet pending, and that schedule h is not a correct one for
T'. □


In the proof of Theorem 4.4, the transaction system T' is given by $T_{1j_1}: x \leftarrow x + 1$, $T_{2j_2}: x \leftarrow 2 \cdot x$,
$T_{1m_1}: x \leftarrow x - 1$; all other steps are read steps, i.e., $T_{ij}: x_{ij} \leftarrow x_{ij}$, and the integrity constraint is "x = 0."
Schedule h is then not a correct one for T', and thus the theorem follows.